Watching movies in the original language is a popular and effective way to stay motivated when learning a foreign language. It is important to choose a film that suits the student's level, so that the student understands 50-70% of the dialogue. To meet this condition, the instructor must watch the film and decide what level it corresponds to. This costs the instructor time, and the student's and the instructor's tastes do not always match.
Project goal: develop an ML solution that automatically determines the difficulty level of English-language movies from their subtitles, i.e. a classifier that assigns each film a difficulty level.
Load modules
dict_keys(['10 Cloverfield lane', '10 things I hate about you', 'Aladdin', 'All dogs go to heaven', 'An American tail', 'A knights tale', 'A star is born', 'Babe', 'Back to the future', 'Batman begins', 'Beauty and the beast', 'Before I go to sleep', 'Before sunrise', 'Before sunset', 'Braveheart', 'Bridget Jones diary', 'Cast away', 'Catch me if you can', 'Clueless', 'Deadpool', 'Die hard', 'Dredd', 'Dune', 'Eurovision song contest ', 'Fight club', 'Finding Nemo', 'Forrest Gump', 'Good Will Hunting', 'Groundhog day', 'Harry Potter and the philosophers stone', 'Her', 'Home alone', 'Hook', 'House of Gucci', 'Inside out', 'It s a wonderful life', 'Knives out', 'Kubo and the two strings', 'Liar liar', 'Lion', 'Logan', 'Love actually', 'Mamma Mia', 'Mary Poppins returns', 'Matilda', 'Meet the parents', 'Moulin Rouge', 'Mrs Doubtfire', 'My big fat Greek wedding', 'Notting Hill', 'Pirates of the Caribbean', 'Pleasantville', 'Powder', 'Pulp fiction', 'Ready or not', 'Shrek', 'Sleepless in Seattle', 'Soul', 'The blind side', 'The break-up', 'The cabin in the woods', 'The fault in our stars', 'The graduate', 'The greatest showman', 'The hangover', 'The holiday', 'The invisible man', 'The jungle book', 'The kings speech', 'The lion king', 'The lord of the rings', 'The man called Flintstone', 'The secret life of Walter Mitty', 'The Shawshank redemption', 'The social network', 'The terminal', 'The terminator', 'The theory of everything', 'The usual suspects', 'Titanic', 'Toy story', 'Twilight', 'Up', 'Venom', 'Warm bodies', 'We are the Millers'])
From this dictionary we then create a dataframe in which each row represents one movie.
| | subs |
|---|---|
| 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... |
| 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... |
| Aladdin | Oh, I come from a land From a faraway place Wh... |
| All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... |
| An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... |
| ... | ... |
| Twilight | I'd never given much thought to how I would di... |
| Up | Movietown News presents Spotlight on Adventure... |
| Venom | Life Foundation Control, this is LF1. The spec... |
| Warm bodies | What am I doing with my life? I'm so pale. I s... |
| We are the Millers | Oh, my God... ...it's full on double rainbow a... |
86 rows × 1 columns
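The dictionary-to-dataframe step can be sketched as follows. This is a minimal illustration, not the notebook's original cell: the variable name `subs_by_title` and the two shortened entries are hypothetical stand-ins for the real subtitles dictionary.

```python
import pandas as pd

# Hypothetical stand-in for the subtitles dictionary (title -> full subtitle text).
subs_by_title = {
    "Aladdin": "Oh, I come from a land From a faraway place",
    "Up": "Movietown News presents Spotlight on Adventure",
}

# One row per movie: the dict keys become the index, the values the "subs" column.
df = pd.Series(subs_by_title, name="subs").to_frame()

print(df.shape)
```

Going through a `Series` first keeps the movie titles as the index, which matches the layout of the table above.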
| | subs | subs_lemma |
|---|---|---|
| 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... | fix sync bozxphd enjoy flick ben phone michell... |
| 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... | right cameron go nine school year army brat en... |
| Aladdin | Oh, I come from a land From a faraway place Wh... | come land faraway place caravan camel roam fla... |
| All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... | captioning make possible mgm home entertainmen... |
| An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... | tanya fievel stop twirl twirl time bed come ha... |
| ... | ... | ... |
| Twilight | I'd never given much thought to how I would di... | never give much thought die die place someone ... |
| Up | Movietown News presents Spotlight on Adventure... | movietown news present spotlight adventure wit... |
| Venom | Life Foundation Control, this is LF1. The spec... | life foundation control lf specimen secure hea... |
| Warm bodies | What am I doing with my life? I'm so pale. I s... | life pale get eat well posture terrible stand ... |
| We are the Millers | Oh, my God... ...it's full on double rainbow a... | full double rainbow across sky whoa god whoo m... |
86 rows × 2 columns
As an example, here are the subtitles of 10 things I hate about you after lemmatization:
"right cameron go nine school year army brat enough sure find padua different old school little ass wipe brain everywhere excuse say right office anymore get deviant see novel finish thank thank lot patrick verona see make visit weekly ritual moment together hit light clever kangaroo boy say expose cafeteria joke lunch lady optimist next time keep pouch michael eckman suppose show around thank god know normally send one audio visual geek know mean michael put slide michael cameron breakdown get basic beautiful people listen unless talk first bother wait rule watch eat see left coffee kid costa rican butthead edgy make sudden movement around delusional white rasta big marley fan think black semi political mostly smoke lot weed guy wait wait let guess cowboy close come cow mcdonald mcdonald future mvys ivy league accept yuppie greed back friend guy close bogey yesterday god happen bogey lozenstein start buy izod outlet mall kick hostile takeover worry pay group even think group bianca stratford sophomore burn pine perish course know beautiful deep sure see difference like love like skecher love prada backpack love skecher prada backpack listen forget incredibly uptight father widely know fact stratford sister allow date whatever everyone think sun also rise love romantic romantic hemingway abusive alcoholic misogynist squander half life hang around picasso try nail leftover oppose bitter self righteous hag friend pipe chachi guess society male asshole make worthy time sylvia plath charlotte bronte simone de beauvoir oppressive patriarchal value dictate education good mr morgan chance could get kat take midol come class someday go get bitch slap go thing stop kat want thank point view know difficult must overcome year upper middle class suburban oppression must tough next time storm crusade well lunch meat whatever white girl complain ask buy book write black man right mon even get start two anything else go office piss mr morgan later undulate desire adrienne remove 
red crimson cape sight reginald stiff judith another word engorge look okay swollen turgid tumescent perfect hear terrorize mr morgan class express opinion terrorist action way express opinion bobby ridgezay way testicle retrieval operation go quite well case interested still maintain kick ball point kat cat people perceive somewhat tempestuous heinous bitch term use often might want work thank always thank excellent guidance let get back reginald quiver member quiver member like virgin alert favorite lookin good lady reach even one reach wanna put money money get go fun guy joey donner jerk model model model mostly regional stuff rumor big tube sock ad come really really man look always vapid say totally conceite talk think look way smile look eye man totally pure miss cameron snotty little princess wear strategically plan sun dress make guy like realize never touch guy like joey realize want friend spend rest life put spank bank move move wrong mean know spanking part rest wrong right wrong wanna take shot guest actually look french tutor serious perfect speak french hey little rambo look kat read last month cosmo run along know overwhelm whelm everjust whelm think europe lady szeet young thing like ride careful leather charming new development disgusting remove head sphincter drive minor encounter shrew girlfriend sister bianca sister mewling rampalian wretch stay cool bro see later look ball katarina make anyone cry today sadly daddy precious nowhere say sarah lawrence get honey great sarah lawrence side country thus basis appeal think decide go stay go school dub like husky decide pick leave let us hope ask bianca drive home kat change drove drive home get upset daddy boy flame imbecile think might ask think know go ask think know answer always two house rule number one dating till graduate number two dating till graduate daddy unfair right want know unfair morning deliver set twin year old girl know say crack whore make skeezy boyfriend wear condom close say 
listen father well say dope focus second girl school date sister date intend see unwashed miscreant go school come planet loser oppose planet look look solve one old rule new rule bianca date mutant never date never date like get sleep night deep slumber father whose daughter impregnate talk sarah lawrence later fine wait daddy get go find blind deaf retard take movie one date sorry look like witty repartee joey eat donner suck make quick roxanne corinne andrew jarrett incredibly horrendous public break quad think start pronunciation right hacking gag spit part alternative french food could eat together saturday night ask cute name cameron listen know let date think french class wait minute curtis cameron come new rule date sister kid let ask like sailing 'cause read place rent boat beaucoup problemo calvin case hear sister particularly hideous breed loser notice little antisocial unsolved mystery use really popular like get sick something theory abound pretty sure incapable human interaction plus bitch sure know lot guy mind go difficult woman mean know people jump airplane ski cliff like extreme dating think could find someone extreme sure hell mean know could look gather group guy could perfect padua fine interested date katarina stratford never rip maybe last two people alive sheep sheep tell pointless one go look criminal hear light state trooper fire year san quentin well least horny serious man whack sell liver black market new set speaker guy listen later get date kat know mean could pay money need backer someone money stupid peach fruit roll see many right lose actually come chat chat actually think run idea see interested well hear want bianca right go sister insane head case one go right conversation purpose think need need hire guy go someone scare easily guy hear eat live duck everything beak foot clearly solid investment walk hall say say get cool association think get involve relax relax let pretend call shot busy set thing time bianca good idea 
right dick face remember guy grip rip great duck last night know see girl kat stratford want go sure sparky look take sister kat start date see whack get rule girl touching story really problem willing make problem provide generous compensation go pay take chick much twenty buck fine let us think go movie buck get popcorn want raisinet right look buck negotiation take leave trailer park fifty buck get deal fabio great practice everybody good hustle stratford thank mr chapin girlie sweat like pig actually way get guy attention mission life obviously strike fancy see work world make sense pick friday right friday night take place never like eleven broadway even know name screw know lot think doubtful doubtful screw want hear defeatist attitude want hear upbeat screw go coach chapin run bogey ever consider new look mean seriously could definite potential bury hostility hostile annoyed try people know think forget care people think always want know happen like adore thank get pearl mom hide three year daddy find drawer last week go start wear like come back claim besides look good trust ride vintage fender follow laundromat see car come say big talker depend topic fender really whip verbal frenzy afraid afraid well people maybe afraid sure think naked transparent want need baby baby asshole day mind bitch whoop whoop insurance cover pms tell seizure sarah lawrence punish want stay close home punish mom leave think could leave fine stop make decision father right want matter know want know want till even get old use want go east coast school want trust make choice want stop try control life control know want continue later wait maim joey car look like go take bus fact completely psycho manage escape attention daddy shell expect result watch bitch violate car count date get get get price hundred buck date advance forget well forget sister well hope smooth think verona go go go know try kat stratford right plan help situation man cameron majorjone bianca stratford chick 
beer flavor nipple think speak correctly say cameron love pure purer say joey donner cash donner plow whoever want plowing patrick pat let explain something set whole thing cameron get girl cameron joey pawn two go help tame wild beast absolutely research find like guy mean strictly non prison movie type way let us start friday night bogey lozenstein party perfect opportunity perfect opportunity take kat think little payback go party let us really important one like well think like white shirt well pensive damn go thoughtful go bogey lozenbrau thing friday night might good 'cause know go bother see right hear bogey lowenstein party really really really want go know unless sister know work far know go guy lang fan find picture jared leto drawer pretty sure harbor sex tendency kind guy like like pretty guy know ever hear say die date guy smoke right smoking else ask investigate inner working sister twisted mind think nothing else work need go behind enemy line go class schedule reading list date book concert ticket concert ticket black pantie tell want sex someday could like color buy black lingerie unless want someone see see room girl room personal bike think bar look like touch anything may get hepatitis get little insight complicated girl excuse one question start drink alcohol liver nothing nothing right first thing kat hate smoker tell non smoker another problem bianca say kat like pretty guy tell pretty guy pretty gorgeous guy sure know right like thai food feminist prose angry girl music indie rock persuasion list cd room suppose buy noodle book sit around listen chick play instrument right ever club skunk favorite band playing tomorrow night see club skunk right get ticket assail ear one night pair black underwear help could hurt right verona need agua two water plan ask might well get mind kind ruin surround usual cloud smoke know quit apparently bad think know guy bikini kill raincoat bad know raincoat watch never see look sexy come bogey party never give 
see use window daddy go well must know small study group friend otherwise know orgy mr stratford party hell sauna know anything party people expect kat go go normal define normal bogey lozenstein party normal bogey lozenstein bogey party lame excuse idiot school drink beer rub hope distract pathetic emptiness meaningless meaningless consumer drive live meaningless consumer drive life forjust one night forget completely wretched sister come kat fine make appearance start party daddy want wear belly daddy night around living room minute understand full weight decision perfectly aware listen every time even think kiss boy want picture wear halter top completely unbalanced go right wait minute drinking drug kissing tattoo piercing ritual animal slaughter kind god give idea daddy right early whatever drive knock sister bianca say right wear kenneth cole dress think mix genre right fact notice direct listen really mean something tell part already think time stop self involve one minute look look like great uncle milton think lose tie maybe right nervous also excited nervous excited mixed know right calm right last party go chuck cheese want talk fun good time remember guy touch anything tell must nigel brie know think get tercel toyota dual side air bag spacious back seat kiss kiss good thank man sweet look fresh tonight pussycat wait hairline recede going away sister stay away sister stay away sister guarantee stay away fight fight guy take outside thank kat look find bianca wait address public something need tell busy enjoy adolescence scamper want one right sister look place getting trash man suppose party know say want funny one later lord dance heather bite keep tie see around anywhere relax relax fine follow love bianca cameron know chastity think art together right neat really look amazing thank know look amazing bianca let us go congregate around mr cuervo see around get sears catalog thing go tube sock gig go huge hemorrhoid cream ad next week know sound kind 
bogus get act see underwear show bathing suit one see difference right show guy party sudden suck really really thank kat let one one mine man get act like human right go see fine fine come need lie lie go sleep sleep good concussion come sit sit need talk little busy right give second whole thing talk never want want joey whole time cameron like girl worth trouble think know see first joey half man secondly let anyone ever make feel like deserve want go come patronizing leave use big word smash think kat tell may concussion care never wake sure start take girl actually like like could find one see need affection blind hatred let sit right let get joey hate choose perfect revenge mainline tequila know say nope say kat come wake look listen kat open eye eye little green know go bunch go jarrett ready home minute know home till one chance man damn shame wanna go sure chastity pass bitch fun tonight ton cameron think could give ride home start band install car stereo start band father love strike type ask father permission think know get thing people know scary picnic pain ass want somebody bianca bianca offense anything mean know everyone dig sister without know vile think maybe another time never want go sail actually say always selfish know 'cause beautiful mean treat people like matter mean really like defend people call conceite help ask learn french blow could back game kat lady szay rhythm heart dance cozgirl kat babe oze table dance right give damn everybody weekend know maybe ask kat unless kick crap dumb butt wanna hear let us open book page sonnet listen faith love thee mine eye thee thousand error note tis heart love despise despite view pleas'd dote know shakespeare dead white guy knozs overlook want write version sonnet opinion everything want iambic pentameter go fight think really good assignment messin really look forward write get class get thank mr morgan shut cool picture collar keep lick stitch kid know fan shakespeare fan involve could refrain 
heart love heart courage make love know macbeth right listen friend like friend anything drunk remember plan work care think want kiss car sorry dweeb putz sorry right talk get scoop say hate fire thousand sun direct quote thank michael comforting know could need day cool maybe two imagine go antiquated mating ritual date really want get dress drakkar noir wear dexter boner feel force listen band definition blozs right right go get dress look entirely wrong perspective make statement goody something new different cupid joey concentrate awfully hard consider gym class help want talk prom look know deal go kat go sister go since let us say take care take care flozer limo tux everything make sure get prom know sick play little game wait wait wait sick let us say excuse see feminine mystique lose copy hear poetry read charming wholesome unwelcome mean think know badass think someone still pantie twist one minute think effect whatsoever pantie effect upchuck reflex nothing right still piss sweet love renew thy force say like people hear look embarrass girl sacrifice altar dignity even score listen say like people hear good true take eye like heaven touch wanna hold much long last love arrive thank god l'm alive good true take eye love baby quite right need baby warm lonely night love baby trust say pretty baby bring pray pretty baby find stay let love baby let love look pretty nervous sir szeate like pig sir eye bloodshot sir get pot confiscate mr chapin talk second stratford idea improve girl soccer team great let us talk later window know really big game hillcr high bicep huge god one even big take steroid hear steroid severely disintegrate package think package point let us hope point kick butt every year think devise plan enable finally defeat thing teach thing misdirection teach siegfrie roy anyway important think look leave run right bang score win get look leave like see plan go go show plan someone else thank enough help sneak detention cool problem think sure 
bust climb window tell ya keep distract dazzle wit excuse act way like people expect live people expectation instead disappoint start cover right something like screze never disappoint right come none stuff true state trooper fallacy dead guy parking lot duck hearsay bobby ridgezay ball fact deserve try grope lunch line fair enough accent real live australia pygmy close mom last year know porn career lie tell something true something true hate pea something real something one else knozs szeet sexy completely hot amazingly self assure anyone ever tell tell every day actually go prom request command come go want stupid tradition come people expect go push need motive want tell need therapy know anyone ever tell anszer question patrick nothing nothing pleasure company wait wait minute page seven good daddy honey like discuss tomorrow night know prom prom kat date think fool know wanna bend rule hot rod joey hot rod sister go go end story let us review kat interested die go know happen prom daddy dance kiss come home quite crisis situation imagine kiss think happen get nezs kissing keep elbozs placenta day long two second ignore fact severely unhinge discuss need night teenage normalcy normal damn dazson river kid sleep bed whatnot daddy get nezs ya get go get jiggy boy care dope ride mama raise fool thank bill ridiculous amount love across nation worldwide believe true story seattle come listen know know hate sit home suzy high school like care care firm believer something reason someone else wish luxury know sophomore got ask go prom go feel like joey never tell go ninth month like babe hate joey happened tell joke right mom leave everyone afterwards tell want anymore ready got piss dump szore never anything everyone else since exception bogey party stunning digestive pyrotechnic possible know warn tell anyone cheerleading squad find tiny dick okay tell want let make mind help daddy hold hostage stupid enough repeat mistake guess think protect let experience anything 
experience good bianca always trust people want guess never know lady thin hair bald spot solve problem instantly paint cover lt amazing powder cling tiny hair head lt actually build leave great great look hair hair system expensive interesting order hair package go prom funny szeetie lt instantly cover bald spot leave great look hair prom dress seem hear word lot lately daddy stop turn explain remember say could date kat date find guy actually kind perfect perfect cameron ask go prom really really wanna go since kat go guess alloze base aforementioned rule previous stipulation course meet let us go know every cop town bucko good get tux last minute something know lie around get dress something know lie around listen really sorry question motive wrong forgive ready prom ma'am mr stratford joey pick bianca see william ask meet mandella tell progress full hallucination milady good sir god call favor know think sophomore prom joey pick congratulation generous princess know joey like one reason even bet go friend go nail tonight milwaukee last year jail know marilyn manson sleep spice girl think see grandpa ill spend year couch watching wheel offortune make spaghettio end story way bianca cheese dick pay take kat little punk could snake bianca nothing hath hitteth faneth joey pal compadre mess wrong guy go pay little bitch right enough cross line come get little punk bianca shoot nose spray ad tomorrow make date bleed sister okay never well give chance pay take one person truly hate know set kat like like payment bonus sleep care money care care think want thank sure want go sail fun fine look know ever thank go last night really mean lot glad ready see later hope sister go meet biker big one full sperm funny tell dance hoppin part part part bianca beat hell guy bianca matter upset rub impressed father like admit daughter capable run life mean become spectator bianca still let play inning bench year go sarah lawrence even able watch game go boy tell change mind already 
send check right assume everyone find time complete poem except mr donner excuse shaft lose glass right anyone brave enough read aloud lord go hate way talk way cut hair hate way drive car hate stare hate big dumb combat boot way read mind hate much make sick even make rhyme hate hate way always right hate lie hate make laugh even bad make cry hate around fact call mostly hate way hate even close even little bit even fender strat think could use know start band besides extra cash know asshole pay take really great girl right screze fall really every day find girl flash someone get detention god buy guitar every time screw know know always drum bass maybe even one day tambourine think offense know everyone dig sister without know vile think suck let us go messin really look forward go see perky go perky go perky second one perky perky right away perky perky perky beginning shot perk bianca let us go congregate around mr cuervo see around worry well right come want long mess wrong guy go pay little bitch right enough cross line kid drive pick tune car want coffee could get prophylactic prophylactic let go could set like god want completely damage send therapy forever want lady shall go office"
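A cleaning step of this kind can be sketched roughly as below. It only illustrates the idea (lowercase, keep alphabetic tokens, drop stopwords, map tokens to lemmas). The tiny stopword set and the `TOY_LEMMAS` lookup are assumptions made for this example; the notebook itself relies on NLTK's stopword list and a real lemmatizer, so its output will differ.

```python
import re

# Tiny illustrative stopword set (assumption); the notebook uses NLTK's English stopwords.
STOPWORDS = {"i", "ll", "be", "with", "you", "so", "to", "the", "a"}

# Toy lemma lookup standing in for a real lemmatizer (assumption).
TOY_LEMMAS = {"went": "go", "goes": "go", "years": "year", "schools": "school"}


def lemmatize_text(text, stopwords=STOPWORDS, lemmas=TOY_LEMMAS):
    """Lowercase, keep alphabetic tokens, drop stopwords, map each token to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [lemmas.get(tok, tok) for tok in tokens if tok not in stopwords]
    return " ".join(kept)


print(lemmatize_text("Hey! I'll be right with you. So, Cameron."))  # hey right cameron
```

Punctuation and casing disappear in this step, which is why sentence-level features have to be computed from the raw `subs` column rather than from `subs_lemma`.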
Create numeric features from the texts so that we can build our machine learning model.
| | subs | subs_lemma | tok_cnt | uniq_tok_cnt | sent_len | sent_len_std | sent_cnt | tok_len | tok_len_std |
|---|---|---|---|---|---|---|---|---|---|
| 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... | fix sync bozxphd enjoy flick ben phone michell... | 2007 | 759 | 5.84 | 4.93 | 891 | 3.78 | 1.91 |
| 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... | right cameron go nine school year army brat en... | 3542 | 1254 | 5.95 | 5.41 | 1488 | 3.75 | 1.91 |
| Aladdin | Oh, I come from a land From a faraway place Wh... | come land faraway place caravan camel roam fla... | 3578 | 1169 | 5.54 | 6.42 | 1533 | 3.75 | 1.89 |
| All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... | captioning make possible mgm home entertainmen... | 3460 | 909 | 4.54 | 4.49 | 1717 | 3.77 | 1.81 |
| An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... | tanya fievel stop twirl twirl time bed come ha... | 2185 | 678 | 4.79 | 5.85 | 1100 | 3.75 | 1.81 |
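The length-based columns in this table could be computed along the following lines. The column names come from the table, but their exact definitions are assumptions: here sentence statistics are taken from the raw text with a naive split on `.`, `!` and `?`, token statistics are taken from the lemmatized text, and both inputs are assumed to be non-empty.

```python
import re
from statistics import mean, pstdev


def surface_features(raw_text, lemma_text):
    """Length-based features; names mirror the table columns, definitions are assumed."""
    # Naive sentence split; a real pipeline would use a proper sentence tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", raw_text) if s.strip()]
    sent_lens = [len(s.split()) for s in sentences]
    tokens = lemma_text.split()
    tok_lens = [len(t) for t in tokens]
    return {
        "tok_cnt": len(tokens),
        "uniq_tok_cnt": len(set(tokens)),
        "sent_cnt": len(sentences),
        "sent_len": round(mean(sent_lens), 2),
        "sent_len_std": round(pstdev(sent_lens), 2),
        "tok_len": round(mean(tok_lens), 2),
        "tok_len_std": round(pstdev(tok_lens), 2),
    }


features = surface_features(
    "I come from a land. From a faraway place!",
    "come land faraway place come",
)
print(features)
```

The population standard deviation (`pstdev`) is used here; whether the notebook uses the population or the sample version is not stated.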
The Oxford 3000 dictionary, organized by CEFR level, contains 4563 words.
| | subs | subs_lemma | tok_cnt | uniq_tok_cnt | sent_len | sent_len_std | sent_cnt | tok_len | tok_len_std | a1_% | a2_% | b1_% | b2_% | c1_% | other_% | other_word |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... | fix sync bozxphd enjoy flick ben phone michell... | 2007 | 759 | 5.84 | 4.93 | 891 | 3.78 | 1.91 | 1261 | 242 | 121 | 48 | 32 | 303 | sync bozxphd flick ben michelle michelle mich... |
| 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... | right cameron go nine school year army brat en... | 3542 | 1254 | 5.95 | 5.41 | 1488 | 3.75 | 1.91 | 2074 | 351 | 165 | 96 | 75 | 781 | cameron year brat padua ass anymore deviant p... |
| Aladdin | Oh, I come from a land From a faraway place Wh... | come land faraway place caravan camel roam fla... | 3578 | 1169 | 5.54 | 6.42 | 1533 | 3.75 | 1.89 | 1717 | 442 | 308 | 96 | 81 | 934 | faraway caravan camel roam barbaric hop arabi... |
| All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... | captioning make possible mgm home entertainmen... | 3460 | 909 | 4.54 | 4.49 | 1717 | 3.77 | 1.81 | 1831 | 391 | 142 | 85 | 58 | 953 | captioning mgm itchy tap yow itchy idgi idgi ... |
| An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... | tanya fievel stop twirl twirl time bed come ha... | 2185 | 678 | 4.79 | 5.85 | 1100 | 3.75 | 1.81 | 1314 | 184 | 119 | 37 | 29 | 502 | tanya fievel twirl twirl hanukkah hanukkah fi... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Twilight | I'd never given much thought to how I would di... | never give much thought die die place someone ... | 3820 | 1108 | 5.50 | 5.93 | 1775 | 3.68 | 1.81 | 2367 | 483 | 171 | 79 | 53 | 667 | phoenix erratic harebrained okay renee thorn ... |
| Up | Movietown News presents Spotlight on Adventure... | movietown news present spotlight adventure wit... | 2545 | 865 | 4.86 | 4.59 | 1270 | 3.78 | 1.96 | 1386 | 281 | 133 | 73 | 33 | 639 | movietown witness civilized world america lur... |
| Venom | Life Foundation Control, this is LF1. The spec... | life foundation control lf specimen secure hea... | 3390 | 1047 | 5.01 | 4.40 | 1740 | 3.72 | 1.90 | 1881 | 400 | 142 | 145 | 78 | 744 | foundation lf roger lf reentry reentry shit l... |
| Warm bodies | What am I doing with my life? I'm so pale. I s... | life pale get eat well posture terrible stand ... | 1873 | 639 | 4.93 | 3.99 | 1014 | 3.63 | 1.86 | 1154 | 261 | 81 | 46 | 39 | 292 | posture straighter jesus wish anymore hoodie ... |
| We are the Millers | Oh, my God... ...it's full on double rainbow a... | full double rainbow across sky whoa god whoo m... | 6439 | 1609 | 4.88 | 4.21 | 3295 | 3.63 | 1.80 | 3607 | 763 | 295 | 144 | 116 | 1514 | rainbow whoa whoo vivid triple rainbow streak... |
86 rows × 16 columns
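A per-level lookup like the one behind these columns can be sketched as follows. Note that for the first rows shown, the `a1_%` to `other_%` values add up to `tok_cnt` (for example 1261 + 242 + 121 + 48 + 32 + 303 = 2007), so they appear to be token counts despite the `%` suffix; the sketch therefore returns counts. The `TOY_CEFR` mapping is a hypothetical stand-in for the real word list, and the levels in it are made up for illustration.

```python
LEVELS = ("a1", "a2", "b1", "b2", "c1")

# Toy word -> CEFR level mapping (assumption; levels are illustrative, not from the Oxford list).
TOY_CEFR = {"come": "a1", "place": "a1", "land": "a2"}


def cefr_counts(lemma_text, cefr=TOY_CEFR):
    """Count tokens per CEFR level; tokens missing from the dictionary go to 'other'."""
    counts = {level: 0 for level in LEVELS}
    counts["other"] = 0
    other_words = []
    for tok in lemma_text.split():
        level = cefr.get(tok)
        if level is None:
            counts["other"] += 1
            other_words.append(tok)
        else:
            counts[level] += 1
    return counts, " ".join(other_words)


counts, other_word = cefr_counts("come land faraway place caravan")
print(counts, other_word)
```

Dividing each count by `tok_cnt` would turn these into actual shares, which makes movies of different lengths comparable.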
| | subs | subs_lemma | tok_cnt | uniq_tok_cnt | sent_len | sent_len_std | sent_cnt | tok_len | tok_len_std | a1_% | a2_% | b1_% | b2_% | c1_% | other_% | other_word | lex_div | lemma_div |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... | fix sync bozxphd enjoy flick ben phone michell... | 2007 | 759 | 5.84 | 4.93 | 891 | 3.78 | 1.91 | 1261 | 242 | 121 | 48 | 32 | 303 | sync bozxphd flick ben michelle michelle mich... | 0.231 | 0.378 |
| 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... | right cameron go nine school year army brat en... | 3542 | 1254 | 5.95 | 5.41 | 1488 | 3.75 | 1.91 | 2074 | 351 | 165 | 96 | 75 | 781 | cameron year brat padua ass anymore deviant p... | 0.204 | 0.354 |
| Aladdin | Oh, I come from a land From a faraway place Wh... | come land faraway place caravan camel roam fla... | 3578 | 1169 | 5.54 | 6.42 | 1533 | 3.75 | 1.89 | 1717 | 442 | 308 | 96 | 81 | 934 | faraway caravan camel roam barbaric hop arabi... | 0.212 | 0.327 |
| All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... | captioning make possible mgm home entertainmen... | 3460 | 909 | 4.54 | 4.49 | 1717 | 3.77 | 1.81 | 1831 | 391 | 142 | 85 | 58 | 953 | captioning mgm itchy tap yow itchy idgi idgi ... | 0.157 | 0.263 |
| An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... | tanya fievel stop twirl twirl time bed come ha... | 2185 | 678 | 4.79 | 5.85 | 1100 | 3.75 | 1.81 | 1314 | 184 | 119 | 37 | 29 | 502 | tanya fievel twirl twirl hanukkah hanukkah fi... | 0.214 | 0.310 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Twilight | I'd never given much thought to how I would di... | never give much thought die die place someone ... | 3820 | 1108 | 5.50 | 5.93 | 1775 | 3.68 | 1.81 | 2367 | 483 | 171 | 79 | 53 | 667 | phoenix erratic harebrained okay renee thorn ... | 0.173 | 0.290 |
| Up | Movietown News presents Spotlight on Adventure... | movietown news present spotlight adventure wit... | 2545 | 865 | 4.86 | 4.59 | 1270 | 3.78 | 1.96 | 1386 | 281 | 133 | 73 | 33 | 639 | movietown witness civilized world america lur... | 0.220 | 0.340 |
| Venom | Life Foundation Control, this is LF1. The spec... | life foundation control lf specimen secure hea... | 3390 | 1047 | 5.01 | 4.40 | 1740 | 3.72 | 1.90 | 1881 | 400 | 142 | 145 | 78 | 744 | foundation lf roger lf reentry reentry shit l... | 0.184 | 0.309 |
| Warm bodies | What am I doing with my life? I'm so pale. I s... | life pale get eat well posture terrible stand ... | 1873 | 639 | 4.93 | 3.99 | 1014 | 3.63 | 1.86 | 1154 | 261 | 81 | 46 | 39 | 292 | posture straighter jesus wish anymore hoodie ... | 0.213 | 0.341 |
| We are the Millers | Oh, my God... ...it's full on double rainbow a... | full double rainbow across sky whoa god whoo m... | 6439 | 1609 | 4.88 | 4.21 | 3295 | 3.63 | 1.80 | 3607 | 763 | 295 | 144 | 116 | 1514 | rainbow whoa whoo vivid triple rainbow streak... | 0.151 | 0.250 |
86 rows × 18 columns
| | title | subs | subs_lemma | tok_cnt | uniq_tok_cnt | sent_len | sent_len_std | sent_cnt | tok_len | tok_len_std | ... | b1_% | b2_% | c1_% | other_% | other_word | lex_div | lemma_div | tree_height | tree_height_std | max_tree_height |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... | fix sync bozxphd enjoy flick ben phone michell... | 2007 | 759 | 5.84 | 4.93 | 891 | 3.78 | 1.91 | ... | 121 | 48 | 32 | 303 | sync bozxphd flick ben michelle michelle mich... | 0.231 | 0.378 | 3.03 | 1.56 | 11 |
| 1 | 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... | right cameron go nine school year army brat en... | 3542 | 1254 | 5.95 | 5.41 | 1488 | 3.75 | 1.91 | ... | 165 | 96 | 75 | 781 | cameron year brat padua ass anymore deviant p... | 0.204 | 0.354 | 3.01 | 1.55 | 14 |
| 2 | Aladdin | Oh, I come from a land From a faraway place Wh... | come land faraway place caravan camel roam fla... | 3578 | 1169 | 5.54 | 6.42 | 1533 | 3.75 | 1.89 | ... | 308 | 96 | 81 | 934 | faraway caravan camel roam barbaric hop arabi... | 0.212 | 0.327 | 2.82 | 1.52 | 10 |
| 3 | All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... | captioning make possible mgm home entertainmen... | 3460 | 909 | 4.54 | 4.49 | 1717 | 3.77 | 1.81 | ... | 142 | 85 | 58 | 953 | captioning mgm itchy tap yow itchy idgi idgi ... | 0.157 | 0.263 | 2.59 | 1.33 | 9 |
| 4 | An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... | tanya fievel stop twirl twirl time bed come ha... | 2185 | 678 | 4.79 | 5.85 | 1100 | 3.75 | 1.81 | ... | 119 | 37 | 29 | 502 | tanya fievel twirl twirl hanukkah hanukkah fi... | 0.214 | 0.310 | 2.59 | 1.38 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | Twilight | I'd never given much thought to how I would di... | never give much thought die die place someone ... | 3820 | 1108 | 5.50 | 5.93 | 1775 | 3.68 | 1.81 | ... | 171 | 79 | 53 | 667 | phoenix erratic harebrained okay renee thorn ... | 0.173 | 0.290 | 2.82 | 1.36 | 9 |
| 82 | Up | Movietown News presents Spotlight on Adventure... | movietown news present spotlight adventure wit... | 2545 | 865 | 4.86 | 4.59 | 1270 | 3.78 | 1.96 | ... | 133 | 73 | 33 | 639 | movietown witness civilized world america lur... | 0.220 | 0.340 | 2.70 | 1.42 | 10 |
| 83 | Venom | Life Foundation Control, this is LF1. The spec... | life foundation control lf specimen secure hea... | 3390 | 1047 | 5.01 | 4.40 | 1740 | 3.72 | 1.90 | ... | 142 | 145 | 78 | 744 | foundation lf roger lf reentry reentry shit l... | 0.184 | 0.309 | 2.72 | 1.39 | 10 |
| 84 | Warm bodies | What am I doing with my life? I'm so pale. I s... | life pale get eat well posture terrible stand ... | 1873 | 639 | 4.93 | 3.99 | 1014 | 3.63 | 1.86 | ... | 81 | 46 | 39 | 292 | posture straighter jesus wish anymore hoodie ... | 0.213 | 0.341 | 2.67 | 1.32 | 9 |
| 85 | We are the Millers | Oh, my God... ...it's full on double rainbow a... | full double rainbow across sky whoa god whoo m... | 6439 | 1609 | 4.88 | 4.21 | 3295 | 3.63 | 1.80 | ... | 295 | 144 | 116 | 1514 | rainbow whoa whoo vivid triple rainbow streak... | 0.151 | 0.250 | 2.69 | 1.36 | 10 |
86 rows × 22 columns
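The `lemma_div` column is consistent with a type-token ratio over the lemmatized subtitles: in the first row, 759 unique tokens out of 2007 gives 0.378. Below is a minimal sketch under that assumption; the function name `type_token_ratio` and the toy lemma list are illustrative and not taken from the notebook.

```python
def type_token_ratio(tokens):
    """Share of distinct tokens among all tokens (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Worked check against the first table row: uniq_tok_cnt / tok_cnt.
ratio_first_row = round(759 / 2007, 3)

# Toy lemma list (made up for illustration): 6 tokens, 4 distinct.
toy_lemmas = "come land faraway place come place".split()
toy_ratio = type_token_ratio(toy_lemmas)
```

A ratio close to 1 means almost every token is new; long films with repetitive dialogue drift toward lower values, which is why the ratio is only comparable between texts of similar length.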
Consider the following Penn Treebank POS tags:
['UH',
'PRP',
'VBP',
'RB',
'VBG',
'DT',
'NNP',
'NN',
'CC',
'WP',
'JJ',
'IN',
'RP',
'PRP$',
'NNS',
'VBZ',
'JJS',
'VBN',
'CD',
'TO',
'VB',
'WRB',
'VBD',
'MD',
'POS',
'JJR',
'WDT',
'NNPS',
'RBR',
'FW',
'EX',
'PDT',
'RBS',
'LS',
'WP$']
We count the occurrences of each tag in every movie.
| | UH | PRP | VBP | RB | VBG | DT | NNP | NN | CC | WP | ... | WDT | NNPS | RBR | FW | EX | PDT | RBS | LS | WP$ | pos_tag_entrophy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 156 | 797 | 291 | 462 | 127 | 405 | 180 | 515 | 136 | 85 | ... | 17 | 6 | 5 | 0 | 17 | 6 | 1 | 1 | 0 | 0.818 |
| 1 | 322 | 1386 | 624 | 759 | 175 | 638 | 384 | 909 | 172 | 125 | ... | 23 | 2 | 4 | 1 | 17 | 9 | 1 | 0 | 1 | 0.809 |
| 2 | 350 | 1138 | 449 | 662 | 108 | 693 | 577 | 913 | 144 | 101 | ... | 6 | 1 | 13 | 5 | 15 | 13 | 10 | 2 | 2 | 0.813 |
| 3 | 180 | 783 | 349 | 350 | 93 | 552 | 2226 | 718 | 159 | 106 | ... | 6 | 7 | 2 | 15 | 11 | 5 | 1 | 1 | 0 | 0.736 |
| 4 | 278 | 636 | 286 | 456 | 87 | 397 | 459 | 568 | 126 | 61 | ... | 13 | 8 | 4 | 4 | 26 | 1 | 2 | 2 | 0 | 0.816 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | 379 | 1640 | 633 | 918 | 194 | 605 | 414 | 891 | 170 | 122 | ... | 19 | 8 | 7 | 1 | 17 | 16 | 5 | 0 | 0 | 0.809 |
| 82 | 321 | 849 | 323 | 477 | 101 | 439 | 464 | 641 | 124 | 52 | ... | 7 | 4 | 3 | 1 | 11 | 13 | 3 | 0 | 0 | 0.811 |
| 83 | 486 | 1354 | 629 | 762 | 208 | 588 | 475 | 778 | 158 | 127 | ... | 24 | 2 | 4 | 4 | 18 | 7 | 3 | 1 | 0 | 0.812 |
| 84 | 195 | 828 | 399 | 490 | 111 | 322 | 173 | 429 | 86 | 64 | ... | 13 | 5 | 6 | 2 | 14 | 8 | 0 | 0 | 0 | 0.807 |
| 85 | 1052 | 2278 | 1039 | 1333 | 369 | 1240 | 866 | 1654 | 289 | 258 | ... | 23 | 17 | 10 | 4 | 16 | 15 | 0 | 0 | 1 | 0.808 |
86 rows × 36 columns
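The counting step and the `pos_tag_entrophy` column can be sketched as follows. This is an assumption-laden illustration: the notebook's own definition of the entropy is not shown, so the normalization by the log of the number of distinct tags is my choice, and the helper names are hypothetical. The tagged pairs are written by hand; a tagger such as `nltk.pos_tag` would normally supply them.

```python
import math
from collections import Counter


def pos_tag_counts(tagged_tokens):
    """Count how often each POS tag occurs in a list of (token, tag) pairs."""
    return Counter(tag for _, tag in tagged_tokens)


def normalized_entropy(counts):
    """Shannon entropy of the tag distribution, scaled to [0, 1].

    Assumption: dividing by log(number of distinct tags) is one common
    normalization; the notebook may define the feature differently.
    """
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))


# Hand-tagged toy sentence standing in for tagger output.
tagged = [("I", "PRP"), ("love", "VBP"), ("old", "JJ"),
          ("movies", "NNS"), ("and", "CC"), ("you", "PRP")]
counts = pos_tag_counts(tagged)
entropy = normalized_entropy(counts)
```

Values near 1 mean the tags are spread evenly; a film dominated by one tag (for example the row with 2226 `NNP` tokens above) would score lower.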
['JJS', 'CD', 'WRB', 'POS', 'JJR', 'WDT', 'NNPS', 'RBR', 'FW', 'EX', 'PDT', 'RBS', 'LS', 'WP$']
We merge the POS tag count dataframe with the previous movie dataframe.
| | title | subs | subs_lemma | tok_cnt | uniq_tok_cnt | sent_len | sent_len_std | sent_cnt | tok_len | tok_len_std | ... | RP | PRP$ | NNS | VBZ | VBN | TO | VB | VBD | MD | pos_tag_entrophy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... | fix sync bozxphd enjoy flick ben phone michell... | 2007 | 759 | 5.84 | 4.93 | 891 | 3.78 | 1.91 | ... | 58 | 83 | 110 | 173 | 76 | 105 | 360 | 248 | 79 | 0.818 |
| 1 | 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... | right cameron go nine school year army brat en... | 3542 | 1254 | 5.95 | 5.41 | 1488 | 3.75 | 1.91 | ... | 95 | 163 | 194 | 300 | 102 | 138 | 644 | 243 | 164 | 0.809 |
| 2 | Aladdin | Oh, I come from a land From a faraway place Wh... | come land faraway place caravan camel roam fla... | 3578 | 1169 | 5.54 | 6.42 | 1533 | 3.75 | 1.89 | ... | 86 | 187 | 194 | 261 | 128 | 167 | 718 | 138 | 194 | 0.813 |
| 3 | All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... | captioning make possible mgm home entertainmen... | 3460 | 909 | 4.54 | 4.49 | 1717 | 3.77 | 1.81 | ... | 48 | 100 | 169 | 269 | 57 | 79 | 487 | 180 | 79 | 0.736 |
| 4 | An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... | tanya fievel stop twirl twirl time bed come ha... | 2185 | 678 | 4.79 | 5.85 | 1100 | 3.75 | 1.81 | ... | 49 | 103 | 142 | 141 | 41 | 62 | 448 | 108 | 109 | 0.816 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | Twilight | I'd never given much thought to how I would di... | never give much thought die die place someone ... | 3820 | 1108 | 5.50 | 5.93 | 1775 | 3.68 | 1.81 | ... | 101 | 191 | 201 | 350 | 86 | 201 | 769 | 339 | 228 | 0.809 |
| 82 | Up | Movietown News presents Spotlight on Adventure... | movietown news present spotlight adventure wit... | 2545 | 865 | 4.86 | 4.59 | 1270 | 3.78 | 1.96 | ... | 80 | 158 | 132 | 240 | 72 | 88 | 559 | 112 | 140 | 0.811 |
| 83 | Venom | Life Foundation Control, this is LF1. The spec... | life foundation control lf specimen secure hea... | 3390 | 1047 | 5.01 | 4.40 | 1740 | 3.72 | 1.90 | ... | 72 | 137 | 198 | 319 | 98 | 185 | 639 | 194 | 147 | 0.812 |
| 84 | Warm bodies | What am I doing with my life? I'm so pale. I s... | life pale get eat well posture terrible stand ... | 1873 | 639 | 4.93 | 3.99 | 1014 | 3.63 | 1.86 | ... | 52 | 77 | 115 | 164 | 55 | 102 | 411 | 133 | 113 | 0.807 |
| 85 | We are the Millers | Oh, my God... ...it's full on double rainbow a... | full double rainbow across sky whoa god whoo m... | 6439 | 1609 | 4.88 | 4.21 | 3295 | 3.63 | 1.80 | ... | 185 | 299 | 352 | 545 | 139 | 243 | 1202 | 383 | 222 | 0.808 |
86 rows × 44 columns
| | title | subs | subs_lemma | tok_cnt | uniq_tok_cnt | sent_len | sent_len_std | sent_cnt | tok_len | tok_len_std | ... | difficult_words | linsear_write_formula | gunning_fog | text_standard | fernandez_huerta | szigriszt_pazos | gutierrez_polini | crawford | gulpease_index | osman |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 Cloverfield lane | Fixed Synced by bozxphd. Enjoy The Flick BEN O... | fix sync bozxphd enjoy flick ben phone michell... | 2007 | 759 | 5.84 | 4.93 | 891 | 3.78 | 1.91 | ... | 290 | 3.500000 | 3.80 | 3rd and 4th grade | 127.19 | 124.16 | 54.29 | -0.3 | 86.2 | 91.98 |
| 1 | 10 things I hate about you | Hey! I'll be right with you. So, Cameron. Here... | right cameron go nine school year army brat en... | 3542 | 1254 | 5.95 | 5.41 | 1488 | 3.75 | 1.91 | ... | 548 | 3.153846 | 4.04 | 2nd and 3rd grade | 126.78 | 123.17 | 54.31 | -0.2 | 84.4 | 91.92 |
| 2 | Aladdin | Oh, I come from a land From a faraway place Wh... | come land faraway place caravan camel roam fla... | 3578 | 1169 | 5.54 | 6.42 | 1533 | 3.75 | 1.89 | ... | 431 | 58.000000 | 3.97 | 5th and 6th grade | 126.58 | 122.91 | 54.30 | 0.0 | 83.1 | 92.02 |
| 3 | All dogs go to heaven | CAPTIONING MADE POSSIBLE BY MGM HOME ENTERTAIN... | captioning make possible mgm home entertainmen... | 3460 | 909 | 4.54 | 4.49 | 1717 | 3.77 | 1.81 | ... | 284 | 2.625000 | 3.47 | 2nd and 3rd grade | 126.78 | 125.83 | 55.51 | -0.4 | 85.3 | 95.44 |
| 4 | An American tail | MAMA Tanya, Fievel? Will you stop that twirlin... | tanya fievel stop twirl twirl time bed come ha... | 2185 | 678 | 4.79 | 5.85 | 1100 | 3.75 | 1.81 | ... | 169 | 2.000000 | 3.45 | 1st and 2nd grade | 126.99 | 123.47 | 54.85 | -0.2 | 85.4 | 94.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | Twilight | I'd never given much thought to how I would di... | never give much thought die die place someone ... | 3820 | 1108 | 5.50 | 5.93 | 1775 | 3.68 | 1.81 | ... | 438 | 2.714286 | 3.38 | 2nd and 3rd grade | 127.90 | 125.08 | 54.90 | -0.6 | 89.8 | 93.65 |
| 82 | Up | Movietown News presents Spotlight on Adventure... | movietown news present spotlight adventure wit... | 2545 | 865 | 4.86 | 4.59 | 1270 | 3.78 | 1.96 | ... | 304 | 8.285714 | 3.60 | 5th and 6th grade | 127.39 | 124.63 | 54.46 | -0.4 | 87.8 | 93.04 |
| 83 | Venom | Life Foundation Control, this is LF1. The spec... | life foundation control lf specimen secure hea... | 3390 | 1047 | 5.01 | 4.40 | 1740 | 3.72 | 1.90 | ... | 452 | 2.437500 | 3.70 | 2nd and 3rd grade | 127.70 | 123.75 | 54.91 | -0.4 | 88.6 | 94.17 |
| 84 | Warm bodies | What am I doing with my life? I'm so pale. I s... | life pale get eat well posture terrible stand ... | 1873 | 639 | 4.93 | 3.99 | 1014 | 3.63 | 1.86 | ... | 231 | 1.944444 | 3.50 | 1st and 2nd grade | 127.50 | 126.09 | 55.78 | -0.6 | 89.1 | 96.32 |
| 85 | We are the Millers | Oh, my God... ...it's full on double rainbow a... | full double rainbow across sky whoa god whoo m... | 6439 | 1609 | 4.88 | 4.21 | 3295 | 3.63 | 1.80 | ... | 608 | 1.789474 | 3.22 | 1st and 2nd grade | 127.90 | 126.38 | 55.72 | -0.7 | 90.8 | 96.27 |
86 rows × 60 columns
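The readability columns share their names with functions in the textstat package. To illustrate what one of these scores measures, here is a self-contained sketch of the Flesch reading ease formula. The syllable counter is a crude vowel-group heuristic chosen for brevity (an assumption of mine), so the numbers will not match textstat's output.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch reading ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Short dialogue-like text versus one long, polysyllabic sentence.
easy = flesch_reading_ease("Hey! I will be right with you. So, here we go.")
hard = flesch_reading_ease(
    "Cinematographic adaptations occasionally necessitate considerable "
    "interpretative sophistication."
)
```

Subtitles consist mostly of very short sentences, which is one plausible reason the scores in the table cluster in a narrow, "very easy" band across all films.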
In the target dataframe
In the df dataframe
| | title | level | subtitles | kinopoisk |
|---|---|---|---|---|
| 39 | Lie to me (series) | B1,B2 | No | NaN |
| 47 | Moulin Rouge 🎙️ | A2/A2+,B1 | No | NaN |
| 80 | The Walking Dead (series)🧟 | A2/A2+ | No | NaN |
The movie 'Moulin Rouge' does have subtitles, even though the table above marks it as 'No'.
We then merge the target dataframe with the previous movie dataframe.
(86, 4)
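Titles in the label table can carry decorations such as emoji ("Moulin Rouge 🎙️"), so an exact-match merge on the raw title would miss them. The sketch below shows one way to normalize titles before merging with pandas; the helper `normalize_title`, the `key` column, and the toy rows (including "Some other film" and its counts) are hypothetical and not the notebook's code.

```python
import re

import pandas as pd


def normalize_title(title):
    """Lowercase, drop non-ASCII symbols such as emoji, collapse whitespace."""
    ascii_only = title.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_only).strip().lower()


# Toy stand-ins for the feature dataframe and the label dataframe.
features = pd.DataFrame({"title": ["Moulin Rouge", "Some other film"],
                         "tok_cnt": [100, 200]})
target = pd.DataFrame({"title": ["Moulin Rouge 🎙️", "Lie to me (series)"],
                       "level": ["A2/A2+,B1", "B1,B2"]})

features["key"] = features["title"].map(normalize_title)
target["key"] = target["title"].map(normalize_title)

# A left merge keeps every movie that has features; indicator=True adds a
# `_merge` column that flags rows without a matching label.
merged = features.merge(target[["key", "level"]], on="key",
                        how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "title"].tolist()
```

Inspecting `unmatched` after the merge is a cheap way to catch titles that still differ in spelling or punctuation.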
<class 'pandas.core.frame.DataFrame'>

RangeIndex: 86 entries, 0 to 85

Data columns (total 61 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | title | 86 non-null | object |
| 1 | subs | 86 non-null | object |
| 2 | subs_lemma | 86 non-null | object |
| 3 | tok_cnt | 86 non-null | int64 |
| 4 | uniq_tok_cnt | 86 non-null | int64 |
| 5 | sent_len | 86 non-null | float64 |
| 6 | sent_len_std | 86 non-null | float64 |
| 7 | sent_cnt | 86 non-null | int64 |
| 8 | tok_len | 86 non-null | float64 |
| 9 | tok_len_std | 86 non-null | float64 |
| 10 | a1_% | 86 non-null | int64 |
| 11 | a2_% | 86 non-null | int64 |
| 12 | b1_% | 86 non-null | int64 |
| 13 | b2_% | 86 non-null | int64 |
| 14 | c1_% | 86 non-null | int64 |
| 15 | other_% | 86 non-null | int64 |
| 16 | other_word | 86 non-null | object |
| 17 | lex_div | 86 non-null | float64 |
| 18 | lemma_div | 86 non-null | float64 |
| 19 | tree_height | 86 non-null | float64 |
| 20 | tree_height_std | 86 non-null | float64 |
| 21 | max_tree_height | 86 non-null | int32 |
| 22 | UH | 86 non-null | int64 |
| 23 | PRP | 86 non-null | int64 |
| 24 | VBP | 86 non-null | int64 |
| 25 | RB | 86 non-null | int64 |
| 26 | VBG | 86 non-null | int64 |
| 27 | DT | 86 non-null | int64 |
| 28 | NNP | 86 non-null | int64 |
| 29 | NN | 86 non-null | int64 |
| 30 | CC | 86 non-null | int64 |
| 31 | WP | 86 non-null | int64 |
| 32 | JJ | 86 non-null | int64 |
| 33 | IN | 86 non-null | int64 |
| 34 | RP | 86 non-null | int64 |
| 35 | PRP$ | 86 non-null | int64 |
| 36 | NNS | 86 non-null | int64 |
| 37 | VBZ | 86 non-null | int64 |
| 38 | VBN | 86 non-null | int64 |
| 39 | TO | 86 non-null | int64 |
| 40 | VB | 86 non-null | int64 |
| 41 | VBD | 86 non-null | int64 |
| 42 | MD | 86 non-null | int64 |
| 43 | pos_tag_entrophy | 86 non-null | float64 |
| 44 | fleich_reading_ease | 86 non-null | float64 |
| 45 | flesch_kincaid_grade | 86 non-null | float64 |
| 46 | smog_index | 86 non-null | float64 |
| 47 | coleman_liau_index | 86 non-null | float64 |
| 48 | automated_readability_index | 86 non-null | float64 |
| 49 | dale_chall_readability_score | 86 non-null | float64 |
| 50 | difficult_words | 86 non-null | int64 |
| 51 | linsear_write_formula | 86 non-null | float64 |
| 52 | gunning_fog | 86 non-null | float64 |
| 53 | text_standard | 86 non-null | object |
| 54 | fernandez_huerta | 86 non-null | float64 |
| 55 | szigriszt_pazos | 86 non-null | float64 |
| 56 | gutierrez_polini | 86 non-null | float64 |
| 57 | crawford | 86 non-null | float64 |
| 58 | gulpease_index | 86 non-null | float64 |
| 59 | osman | 86 non-null | float64 |
| 60 | level | 86 non-null | category |

dtypes: category(1), float64(23), int32(1), int64(31), object(5)

memory usage: 40.4+ KB
| level | count |
|---|---|
| A2/A2+ | 26 |
| A2/A2+,B1 | 5 |
| B1 | 28 |
| B1,B2 | 8 |
| B2 | 19 |
24
| | level | A2/A2+ | A2/A2+,B1 | B1 | B1,B2 | B2 |
|---|---|---|---|---|---|---|
| sent_len | mean | 5.491923 | 6.232000 | 5.730000 | 6.196250 | 5.846316 |
| | std | 0.689701 | 1.122751 | 0.701517 | 0.449410 | 0.699875 |
| sent_len_std | mean | 5.198077 | 5.750000 | 5.180714 | 5.133750 | 5.194737 |
| | std | 1.249189 | 1.657785 | 1.650582 | 0.597589 | 1.442407 |
| tok_len | mean | 3.743462 | 3.826000 | 3.756429 | 3.778750 | 3.824737 |
| | std | 0.091387 | 0.087920 | 0.087653 | 0.121236 | 0.126375 |
| tok_len_std | mean | 1.870385 | 2.016000 | 1.919643 | 1.931250 | 1.972105 |
| | std | 0.079974 | 0.061887 | 0.072953 | 0.101480 | 0.123629 |
| lex_div | mean | 0.191231 | 0.165400 | 0.189000 | 0.177875 | 0.187579 |
| | std | 0.033519 | 0.021373 | 0.029991 | 0.028713 | 0.037509 |
| lemma_div | mean | 0.300462 | 0.265400 | 0.301107 | 0.282875 | 0.300158 |
| | std | 0.048816 | 0.023330 | 0.043606 | 0.032100 | 0.051426 |
| tree_height | mean | 2.870769 | 3.044000 | 2.953214 | 3.102500 | 3.010526 |
| | std | 0.203390 | 0.255206 | 0.173804 | 0.197611 | 0.202772 |
| tree_height_std | mean | 1.482308 | 1.606000 | 1.509286 | 1.540000 | 1.524737 |
| | std | 0.165343 | 0.110136 | 0.165304 | 0.069488 | 0.143309 |
| max_tree_height | mean | 10.923077 | 11.600000 | 12.607143 | 11.750000 | 12.105263 |
| | std | 2.855494 | 0.894427 | 3.909526 | 2.251983 | 1.370107 |
| pos_tag_entrophy | mean | 0.809500 | 0.810000 | 0.813107 | 0.809625 | 0.812000 |
| | std | 0.016207 | 0.006364 | 0.005050 | 0.007596 | 0.005568 |
| fleich_reading_ease | mean | 97.016923 | 95.526000 | 97.342857 | 96.607500 | 95.137368 |
| | std | 2.418442 | 4.601786 | 1.691431 | 2.963481 | 4.102992 |
| flesch_kincaid_grade | mean | 1.600000 | 1.920000 | 1.571429 | 1.662500 | 1.831579 |
| | std | 0.491528 | 0.887130 | 0.409025 | 0.410357 | 0.629861 |
| smog_index | mean | 5.996154 | 6.520000 | 6.092857 | 6.137500 | 6.236842 |
| | std | 0.349263 | 0.432435 | 0.273426 | 0.512522 | 0.439963 |
| coleman_liau_index | mean | 2.863846 | 3.516000 | 2.983571 | 3.032500 | 3.255263 |
| | std | 0.607085 | 0.781044 | 0.599025 | 0.570507 | 0.756434 |
| automated_readability_index | mean | 2.442308 | 2.980000 | 2.525000 | 2.612500 | 2.705263 |
| | std | 0.462319 | 0.653452 | 0.455115 | 0.559177 | 0.575880 |
| dale_chall_readability_score | mean | 5.358077 | 5.274000 | 5.400357 | 5.307500 | 5.480526 |
| | std | 0.265812 | 0.174442 | 0.220042 | 0.288729 | 0.294759 |
| linsear_write_formula | mean | 10.575046 | 4.446320 | 5.243355 | 3.918059 | 4.474143 |
| | std | 18.220750 | 2.108565 | 9.484014 | 1.146797 | 5.076216 |
| gunning_fog | mean | 3.661923 | 3.936000 | 3.722500 | 3.722500 | 3.794211 |
| | std | 0.447464 | 0.395386 | 0.401586 | 0.264885 | 0.386355 |
| fernandez_huerta | mean | 126.691923 | 125.498000 | 126.917500 | 126.401250 | 125.402105 |
| | std | 1.846372 | 3.529599 | 1.347227 | 2.103239 | 2.958556 |
| szigriszt_pazos | mean | 124.145385 | 121.686000 | 123.658571 | 123.226250 | 122.860526 |
| | std | 1.762950 | 1.730832 | 1.637522 | 1.949234 | 2.362523 |
| gutierrez_polini | mean | 54.633846 | 53.764000 | 54.427500 | 54.370000 | 53.970526 |
| | std | 0.855944 | 0.973566 | 0.795681 | 1.012253 | 1.096259 |
| crawford | mean | -0.350000 | 0.020000 | -0.282143 | -0.237500 | -0.247368 |
| | std | 0.343220 | 0.408656 | 0.370239 | 0.226385 | 0.367224 |
| gulpease_index | mean | 86.673077 | 83.780000 | 86.435714 | 85.850000 | 86.547368 |
| | std | 4.093073 | 4.931734 | 4.349561 | 1.990693 | 3.876621 |
| osman | mean | 93.055000 | 90.592000 | 92.438571 | 92.348750 | 91.151053 |
| | std | 2.442966 | 2.791249 | 2.311714 | 3.068478 | 3.147964 |
Summary by feature: a1_%, a2_%, b1_%, b2_%, c1_%, other_%, tree_height, tree_height_std, max_tree_height
Summary by feature: UH, PRP, VBP, RB, VBG, DT, NNP, NN, CC, WP, JJ, IN, RP, PRP$, NNS, VBZ, VBN, TO, VB, VBD, MD, pos_tag_entrophy
Take a look at the tokens that cannot be categorized by our CEFR dictionary.
4563
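The per-level counts (`a1_%` to `c1_%`, `other_%`) and the `other_word` column can be pictured as a dictionary lookup per lemma. The sketch below assumes a plain word-to-level mapping; the three dictionary entries and their levels are invented for illustration, and `cefr_profile` is a hypothetical helper rather than the notebook's function.

```python
from collections import Counter

# Hypothetical miniature CEFR word list; the levels here are made up.
CEFR_LEVELS = {"come": "a1", "place": "a1", "land": "a2"}


def cefr_profile(lemmas, levels=CEFR_LEVELS):
    """Count lemmas per CEFR level and collect the ones with no known level."""
    counts = Counter()
    uncategorized = []
    for lemma in lemmas:
        level = levels.get(lemma)
        if level is None:
            counts["other"] += 1
            uncategorized.append(lemma)
        else:
            counts[level] += 1
    return counts, uncategorized


# First lemmas of the Aladdin row shown earlier.
lemmas = "come land faraway place caravan camel roam".split()
counts, uncategorized = cefr_profile(lemmas)
unique_uncategorized = sorted(set(uncategorized))
```

Collecting the distinct uncategorized tokens across all films, as the count above does, shows how much of the vocabulary (names, slang, interjections) falls outside the word list.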
features and targets
Split the data into training and testing sets, stratified on y.

Training samples: 73. Testing samples: 13.

| level | training samples | testing samples |
|---|---|---|
| A2/A2+ | 22 | 4 |
| A2/A2+,B1 | 4 | 1 |
| B1 | 24 | 4 |
| B1,B2 | 7 | 1 |
| B2 | 16 | 3 |
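A stratified split of this size can be sketched with scikit-learn's `train_test_split`. The labels below are toy data with the same class sizes as the level counts reported earlier; `test_size=0.15` and the seed are my assumptions, chosen because they yield 73 training and 13 testing rows.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy labels mirroring the reported class sizes (26, 5, 28, 8, 19).
labels = (["A2/A2+"] * 26 + ["A2/A2+,B1"] * 5 + ["B1"] * 28
          + ["B1,B2"] * 8 + ["B2"] * 19)
rows = list(range(len(labels)))  # stand-in for the feature rows

X_train, X_test, y_train, y_test = train_test_split(
    rows, labels,
    test_size=0.15,      # assumed fraction; 13 of the 86 rows
    stratify=labels,     # keep class proportions similar in both parts
    random_state=12345,  # arbitrary seed for reproducibility
)
train_counts = Counter(y_train)
test_counts = Counter(y_test)
```

With only 13 test rows, the two mixed classes contribute a single test sample each, so per-class test metrics for them are necessarily coarse.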
define function to
define function to
best weighted F1: 0.43823979591836737 with {'clf__C': 2.5999999999999996, 'clf__gamma': 'scale', 'clf__kernel': 'sigmoid'}
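Parameter keys such as `clf__C` indicate a scikit-learn `Pipeline` whose final step is named `clf`, tuned by a grid search. A minimal sketch of that setup follows; the scaling step, the grid values, the number of folds, and the synthetic data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the movie features (73 rows, 3 classes).
X, y = make_classification(n_samples=73, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# The step name "clf" is what produces parameter keys such as "clf__C".
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
param_grid = {
    "clf__C": [0.1, 1.0, 2.6],
    "clf__kernel": ["rbf", "sigmoid"],
    "clf__gamma": ["scale"],
}
search = GridSearchCV(
    pipe, param_grid,
    scoring="f1_weighted",  # weighted F1 as the selection metric
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_params, best_score = search.best_params_, search.best_score_
```

Keeping the scaler inside the pipeline means it is refit on each training fold, which avoids leaking validation statistics into the search.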
| Split | Metric | Per-fold scores | Mean |
|---|---|---|---|
| Training | accuracy | 0.43, 0.48, 0.4, 0.56, 0.56, 0.47, 0.5, 0.56, 0.48, 0.45 | 0.49 |
| Training | F-score | 0.4, 0.44, 0.37, 0.54, 0.52, 0.42, 0.46, 0.51, 0.45, 0.41 | 0.45 |
| Testing | accuracy | 0.5, 0.88, 0.62, 0.57, 0.29, 0.43, 0.29, 0.29, 0.57, 0.57 | 0.50 |
| Testing | F-score | 0.44, 0.83, 0.53, 0.56, 0.26, 0.34, 0.16, 0.21, 0.53, 0.51 | 0.44 |
| Weight | Feature |
|---|---|
| 0.0229 ± 0.0184 | linsear_write_formula |
| 0.0220 ± 0.0250 | PRP$ |
| 0.0204 ± 0.0126 | automated_readability_index |
| 0.0203 ± 0.0276 | sent_len |
| 0.0202 ± 0.0299 | RB |
| 0.0178 ± 0.0340 | NNS |
| 0.0158 ± 0.0098 | NNP |
| 0.0155 ± 0.0114 | c1_% |
| 0.0145 ± 0.0190 | tree_height_std |
| 0.0138 ± 0.0330 | VBD |
| 0.0127 ± 0.0000 | NN |
| 0.0127 ± 0.0002 | coleman_liau_index |
| 0.0125 ± 0.0158 | fernandez_huerta |
| 0.0121 ± 0.0282 | b2_% |
| 0.0102 ± 0.0191 | szigriszt_pazos |
| 0.0101 ± 0.0188 | smog_index |
| 0.0099 ± 0.0187 | DT |
| 0.0098 ± 0.0191 | other_% |
| 0.0096 ± 0.0380 | tok_len |
| 0.0096 ± 0.0110 | tree_height |
| … 35 more … | |
['sent_len', 'b2_%', 'c1_%', 'tree_height_std', 'RB', 'NNP', 'NN', 'PRP$', 'NNS', 'VBD', 'smog_index', 'coleman_liau_index', 'automated_readability_index', 'linsear_write_formula', 'fernandez_huerta', 'szigriszt_pazos']
best weighted F1: 0.41553571428571434 with {'clf__C': 1, 'clf__class_weight': 'balanced', 'clf__multi_class': 'multinomial', 'clf__penalty': 'l1', 'clf__solver': 'saga'}
| Split | Metric | Per-fold scores | Mean |
|---|---|---|---|
| Training | accuracy | 0.78, 0.82, 0.75, 0.82, 0.77, 0.77, 0.76, 0.74, 0.79, 0.79 | 0.78 |
| Training | F-score | 0.78, 0.82, 0.75, 0.82, 0.77, 0.77, 0.76, 0.74, 0.79, 0.78 | 0.78 |
| Testing | accuracy | 0.38, 0.62, 0.5, 0.29, 0.43, 0.29, 0.57, 0.29, 0.29, 0.71 | 0.44 |
| Testing | F-score | 0.4, 0.68, 0.47, 0.21, 0.37, 0.26, 0.54, 0.29, 0.29, 0.66 | 0.42 |
| Weight | Feature |
|---|---|
| 0.1653 ± 0.0837 | pos_tag_entrophy |
| 0.1607 ± 0.0586 | tok_len_std |
| 0.1361 ± 0.0884 | VBD |
| 0.1219 ± 0.0255 | c1_% |
| 0.0912 ± 0.0458 | b2_% |
| 0.0892 ± 0.0319 | VBN |
| 0.0802 ± 0.0661 | tok_len |
| 0.0619 ± 0.0276 | b1_% |
| 0.0545 ± 0.0008 | JJ |
| 0.0468 ± 0.0359 | NNP |
| 0.0443 ± 0.0230 | lemma_div |
| 0.0433 ± 0.0538 | max_tree_height |
| 0.0423 ± 0.0249 | VB |
| 0.0296 ± 0.0336 | other_% |
| 0.0283 ± 0.0376 | RB |
| 0.0277 ± 0.0243 | PRP$ |
| 0.0251 ± 0.0438 | VBG |
| 0.0200 ± 0.0451 | fleich_reading_ease |
| 0.0183 ± 0.0299 | tree_height |
| 0.0122 ± 0.0497 | gulpease_index |
| … 35 more … | |
best weighted F1: 0.417984693877551 with {'clf__algorithm': 'auto', 'clf__n_neighbors': 7, 'clf__p': 1, 'clf__weights': 'distance'}
| Split | Metric | Per-fold scores | Mean |
|---|---|---|---|
| Training | accuracy | 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 | 1.00 |
| Training | F-score | 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 | 1.00 |
| Testing | accuracy | 0.62, 0.62, 0.5, 0.71, 0.29, 0.43, 0.71, 0.14, 0.14, 0.43 | 0.46 |
| Testing | F-score | 0.59, 0.63, 0.39, 0.71, 0.21, 0.4, 0.66, 0.07, 0.08, 0.43 | 0.42 |
| Weight | Feature |
|---|---|
| 0 ± 0.0000 | sent_len |
| 0 ± 0.0000 | NNP |
| 0 ± 0.0000 | DT |
| 0 ± 0.0000 | VBG |
| 0 ± 0.0000 | RB |
| 0 ± 0.0000 | VBP |
| 0 ± 0.0000 | PRP |
| 0 ± 0.0000 | UH |
| 0 ± 0.0000 | max_tree_height |
| 0 ± 0.0000 | tree_height_std |
| 0 ± 0.0000 | tree_height |
| 0 ± 0.0000 | CC |
| 0 ± 0.0000 | uniq_tok_cnt |
| 0 ± 0.0000 | other_% |
| 0 ± 0.0000 | tok_len |
| 0 ± 0.0000 | sent_len_std |
| 0 ± 0.0000 | lemma_div |
| 0 ± 0.0000 | sent_cnt |
| 0 ± 0.0000 | NN |
| 0 ± 0.0000 | osman |
| … 35 more … | |
best weighted F1: 0.426875 with {'max_depth': 5, 'max_features': 'auto', 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 100, 'random_state': 12345}
{'max_depth': 5,
'max_features': 'auto',
'min_samples_leaf': 4,
'min_samples_split': 2,
'n_estimators': 100,
'random_state': 12345}
| Split | Metric | Per-fold scores | Mean |
|---|---|---|---|
| Training | accuracy | 0.86, 0.83, 0.8, 0.85, 0.82, 0.8, 0.85, 0.89, 0.85, 0.86 | 0.84 |
| Training | F-score | 0.84, 0.81, 0.76, 0.82, 0.79, 0.76, 0.84, 0.88, 0.82, 0.84 | 0.82 |
| Testing | accuracy | 0.38, 0.62, 0.5, 0.57, 0.29, 0.43, 0.71, 0.14, 0.29, 0.43 | 0.44 |
| Testing | F-score | 0.34, 0.6, 0.39, 0.52, 0.22, 0.34, 0.66, 0.1, 0.26, 0.43 | 0.39 |
| Weight | Feature |
|---|---|
| 0.0618 ± 0.0286 | tree_height |
| 0.0307 ± 0.0458 | VBG |
| 0.0235 ± 0.0187 | RB |
| 0.0233 ± 0.0293 | tok_len_std |
| 0.0221 ± 0.0319 | VBD |
| 0.0202 ± 0.0283 | PRP |
| 0.0137 ± 0.0151 | b2_% |
| 0.0129 ± 0.0000 | UH |
| 0.0129 ± 0.0001 | tok_cnt |
| 0.0125 ± 0.0000 | WP |
| 0.0123 ± 0.0312 | pos_tag_entrophy |
| 0.0087 ± 0.0123 | tok_len |
| 0.0078 ± 0.0125 | a1_% |
| 0.0077 ± 0.0126 | NNS |
| 0.0068 ± 0.0223 | difficult_words |
| 0.0067 ± 0.0247 | max_tree_height |
| 0.0057 ± 0.0126 | RP |
| 0.0054 ± 0.0121 | coleman_liau_index |
| 0.0052 ± 0.0125 | VBZ |
| 0.0052 ± 0.0126 | gunning_fog |
| … 35 more … | |
selected features
{'JJ',
'NN',
'NNP',
'NNS',
'PRP',
'PRP$',
'RB',
'UH',
'VB',
'VBD',
'VBG',
'VBN',
'WP',
'automated_readability_index',
'b1_%',
'b2_%',
'c1_%',
'coleman_liau_index',
'difficult_words',
'fernandez_huerta',
'fleich_reading_ease',
'gulpease_index',
'lemma_div',
'linsear_write_formula',
'max_tree_height',
'other_%',
'pos_tag_entrophy',
'sent_len',
'smog_index',
'szigriszt_pazos',
'tok_cnt',
'tok_len',
'tok_len_std',
'tree_height',
'tree_height_std'}
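The selected set looks like the union of features whose permutation weight exceeds roughly 0.01 in at least one model; for instance `smog_index` (0.0101) is kept while `DT` (0.0099) and `a1_%` (0.0078) are not. The sketch below encodes that reading. Both the cutoff and the model labels are inferred, not stated in the notebook, and only a few rows of each importance table are copied in.

```python
# A few weights copied from the permutation-importance tables above; the
# model labels ("svc", "logreg", "forest") are my own shorthand.
importances = {
    "svc": {"linsear_write_formula": 0.0229, "smog_index": 0.0101, "DT": 0.0099},
    "logreg": {"pos_tag_entrophy": 0.1653, "gulpease_index": 0.0122},
    "forest": {"tree_height": 0.0618, "a1_%": 0.0078},
}

THRESHOLD = 0.01  # assumed cutoff, inferred from which features were kept


def select_features(importances, threshold):
    """Union of features whose weight exceeds the threshold in any model."""
    selected = set()
    for weights in importances.values():
        selected |= {name for name, w in weights.items() if w > threshold}
    return selected


selected = select_features(importances, THRESHOLD)
```

Taking a union rather than an intersection keeps any feature that at least one model finds useful, at the cost of a larger feature set for only 73 training rows.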
best weighted F1: 0.4418452380952381 with {'clf__C': 1.0, 'clf__gamma': 'scale', 'clf__kernel': 'rbf'}
Accuracy score: 0.38
| | A2/A2+ | A2/A2+,B1 | B1 | B1,B2 | B2 | weighted_avg |
|---|---|---|---|---|---|---|
| precision | 0.500000 | 0.0 | 0.400000 | 0.0 | 0.333333 | 0.353846 |
| recall | 0.250000 | 0.0 | 0.500000 | 0.0 | 0.666667 | 0.384615 |
| fscore | 0.333333 | 0.0 | 0.444444 | 0.0 | 0.444444 | 0.341880 |
| support | 4.000000 | 1.0 | 4.000000 | 1.0 | 3.000000 | NaN |
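The `weighted_avg` column in the table directly above is the support-weighted mean of the per-class scores. A short worked check for the F-score row, using the values as printed (the function name is illustrative):

```python
# Per-class F-scores and supports copied from the table above.
fscore = {"A2/A2+": 0.333333, "A2/A2+,B1": 0.0, "B1": 0.444444,
          "B1,B2": 0.0, "B2": 0.444444}
support = {"A2/A2+": 4, "A2/A2+,B1": 1, "B1": 4, "B1,B2": 1, "B2": 3}


def weighted_average(scores, support):
    """Average of per-class scores, weighted by the number of true samples."""
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total


weighted_f1 = weighted_average(fscore, support)
```

Because the two mixed classes have a support of one sample each, their zero scores pull the weighted average down only slightly; a macro average would penalize them much more.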
best weighted F1: 0.4195068027210884 with {'clf__algorithm': 'auto', 'clf__n_neighbors': 10, 'clf__p': 1, 'clf__weights': 'distance'}
Accuracy score: 0.54
| | A2/A2+ | A2/A2+,B1 | B1 | B1,B2 | B2 | weighted_avg |
|---|---|---|---|---|---|---|
| precision | 0.666667 | 0.0 | 0.50 | 0.0 | 0.500000 | 0.474359 |
| recall | 0.500000 | 0.0 | 0.75 | 0.0 | 0.666667 | 0.538462 |
| fscore | 0.571429 | 0.0 | 0.60 | 0.0 | 0.571429 | 0.492308 |
| support | 4.000000 | 1.0 | 4.00 | 1.0 | 3.000000 | NaN |
best weighted F1: 0.42688350340136055 with {'max_depth': None, 'max_features': 'auto', 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 200, 'random_state': 12345}
{'max_depth': None,
'max_features': 'auto',
'min_samples_leaf': 4,
'min_samples_split': 2,
'n_estimators': 200,
'random_state': 12345}
Accuracy score: 0.54
| | A2/A2+ | A2/A2+,B1 | B1 | B1,B2 | B2 | weighted_avg |
|---|---|---|---|---|---|---|
| precision | 0.666667 | 0.0 | 0.50 | 0.0 | 0.500000 | 0.474359 |
| recall | 0.500000 | 0.0 | 0.75 | 0.0 | 0.666667 | 0.538462 |
| fscore | 0.571429 | 0.0 | 0.60 | 0.0 | 0.571429 | 0.492308 |
| support | 4.000000 | 1.0 | 4.00 | 1.0 | 3.000000 | NaN |
| Dep. Variable: | level | Log-Likelihood: | -59.047 |
|---|---|---|---|
| Model: | OrderedModel | AIC: | 196.1 |
| Method: | Maximum Likelihood | BIC: | 285.4 |
| Date: | Fri, 29 Jul 2022 | | |
| Time: | 14:24:26 | | |
| No. Observations: | 73 | | |
| Df Residuals: | 34 | | |
| Df Model: | 39 | | |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | -4.4566 | 1.980 | -2.251 | 0.024 | -8.337 | -0.576 |
| x2 | 0.6498 | 0.539 | 1.205 | 0.228 | -0.407 | 1.707 |
| x3 | 4.0730 | 1.803 | 2.259 | 0.024 | 0.538 | 7.607 |
| x4 | 0.5044 | 1.040 | 0.485 | 0.628 | -1.534 | 2.543 |
| x5 | -1.7257 | 0.758 | -2.278 | 0.023 | -3.211 | -0.241 |
| x6 | 0.8341 | 0.503 | 1.658 | 0.097 | -0.152 | 1.820 |
| x7 | -0.0227 | 0.871 | -0.026 | 0.979 | -1.731 | 1.685 |
| x8 | 0.4895 | 0.341 | 1.436 | 0.151 | -0.179 | 1.158 |
| x9 | 1.5773 | 1.106 | 1.425 | 0.154 | -0.591 | 3.746 |
| x10 | -1.1327 | 0.663 | -1.708 | 0.088 | -2.432 | 0.167 |
| x11 | -0.2056 | 0.354 | -0.582 | 0.561 | -0.898 | 0.487 |
| x12 | -0.0741 | 1.257 | -0.059 | 0.953 | -2.538 | 2.390 |
| x13 | -0.2864 | 0.795 | -0.360 | 0.719 | -1.845 | 1.272 |
| x14 | -0.9263 | 0.591 | -1.566 | 0.117 | -2.085 | 0.233 |
| x15 | -0.0761 | 1.178 | -0.065 | 0.949 | -2.384 | 2.232 |
| x16 | 1.9060 | 0.598 | 3.187 | 0.001 | 0.734 | 3.078 |
| x17 | 1.4269 | 0.397 | 3.598 | 0.000 | 0.650 | 2.204 |
| x18 | 2.5512 | 0.834 | 3.059 | 0.002 | 0.916 | 4.186 |
| x19 | 0.4153 | 0.777 | 0.535 | 0.593 | -1.107 | 1.938 |
| x20 | -0.2403 | 0.622 | -0.386 | 0.699 | -1.460 | 0.979 |
| x21 | 8.7552 | 9.940 | 0.881 | 0.378 | -10.728 | 28.238 |
| x22 | 0.9560 | 1.806 | 0.529 | 0.597 | -2.585 | 4.497 |
| x23 | 1.5971 | 0.880 | 1.815 | 0.070 | -0.128 | 3.322 |
| x24 | -0.3622 | 0.894 | -0.405 | 0.685 | -2.114 | 1.390 |
| x25 | -3.2238 | 1.384 | -2.329 | 0.020 | -5.936 | -0.511 |
| x26 | -10.5873 | 10.605 | -0.998 | 0.318 | -31.372 | 10.197 |
| x27 | 2.6472 | 1.878 | 1.410 | 0.159 | -1.033 | 6.328 |
| x28 | 1.0480 | 0.569 | 1.842 | 0.065 | -0.067 | 2.163 |
| x29 | 1.3284 | 0.770 | 1.724 | 0.085 | -0.181 | 2.838 |
| x30 | -1.0771 | 0.697 | -1.546 | 0.122 | -2.442 | 0.288 |
| x31 | 3.6436 | 4.658 | 0.782 | 0.434 | -5.485 | 12.772 |
| x32 | -0.6734 | 0.542 | -1.242 | 0.214 | -1.736 | 0.390 |
| x33 | -1.9469 | 1.240 | -1.570 | 0.116 | -4.377 | 0.483 |
| x34 | -0.0381 | 1.129 | -0.034 | 0.973 | -2.251 | 2.175 |
| x35 | 0.5837 | 1.714 | 0.341 | 0.733 | -2.775 | 3.942 |
| A2/A2+/A2/A2+,B1 | -1.1735 | 0.273 | -4.294 | 0.000 | -1.709 | -0.638 |
| A2/A2+,B1/B1 | -1.1064 | 0.478 | -2.313 | 0.021 | -2.044 | -0.169 |
| B1/B1,B2 | 0.6221 | 0.186 | 3.344 | 0.001 | 0.257 | 0.987 |
| B1,B2/B2 | -0.2711 | 0.345 | -0.787 | 0.431 | -0.946 | 0.404 |
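The four threshold rows at the bottom of the table are not increasing, which is expected for statsmodels' `OrderedModel`: as far as I know, it reports the first cutpoint directly and the later ones as logarithms of the increments. The sketch below converts the printed values into cutpoints and then into class probabilities. The logistic link is an assumption (the summary does not show the link, and the `OrderedModel` default is probit, in which case the normal CDF would replace the logistic one); the helper names are hypothetical.

```python
import math

# Threshold parameters as printed in the last four rows of the table above.
raw = [-1.1735, -1.1064, 0.6221, -0.2711]


def to_cutpoints(raw_params):
    """First value is a cutpoint; later values are log-increments."""
    cutpoints = [raw_params[0]]
    for log_step in raw_params[1:]:
        cutpoints.append(cutpoints[-1] + math.exp(log_step))
    return cutpoints


def class_probabilities(linear_score, cutpoints):
    """Class probabilities for one observation, assuming a logistic link."""
    def cdf(z):
        return 1.0 / (1.0 + math.exp(-z))
    bounds = [0.0] + [cdf(c - linear_score) for c in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


cutpoints = to_cutpoints(raw)
probs = class_probabilities(0.0, cutpoints)  # one value per level, A2/A2+ to B2
```

With 39 estimated parameters for 73 observations, the individual coefficients and thresholds should be read with caution.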
accuracy score 0.54